A Filter Feature Selection Method for Clustering
Abstract
High-dimensional data is a challenge for the KDD community. Feature selection (FS) is an efficient preprocessing step for dimensionality reduction because it removes redundant and/or noisy features. Few FS methods, most of them recent, have been proposed for clustering. Furthermore, most of them are "wrapper" methods that require clustering algorithms to evaluate the selected feature subsets. Because of this reliance on clustering algorithms, which often require parameter settings (such as the number of clusters), and because there is no consensus on a suitable criterion for evaluating clustering quality in different subspaces, the wrapper approach cannot be considered a universal way to perform FS within the clustering framework. We therefore propose and evaluate in this paper a "filter" FS method, which is completely independent of any clustering algorithm. It is based on two specific indices that assess the adequacy between two sets of features. Because these indices are cheap to compute (they require just one scan of the dataset), the proposed method can be considered effective not only in terms of result quality but also in terms of execution time...
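To make the filter idea concrete, here is a minimal sketch of a filter-style selection step that never runs a clustering algorithm. It is not the method of this paper: the two indices the abstract mentions are not described here, so the sketch swaps in absolute Pearson correlation as a stand-in redundancy measure. The function names, the 0.95 threshold, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature: treated as uncorrelated (assumption)
    return cov / sqrt(vx * vy)


def filter_redundant(columns, threshold=0.95):
    """Greedy filter: keep a feature unless it is strongly correlated
    (in absolute value) with a feature that was already kept.

    `columns` maps a feature name to its list of values.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is an exact multiple of "a", so it carries no new information.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = filter_redundant(data)
print(selected)
```

On this toy data the filter keeps "a" and "c" and drops "b". Unlike a wrapper, nothing here depends on a number of clusters or on a clustering quality criterion.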
Similar resources
Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
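The two measures named in this title have standard textbook definitions for discrete variables: information gain IG(X;Y) = H(X) + H(Y) - H(X,Y), and symmetric uncertainty SU(X,Y) = 2·IG(X;Y) / (H(X) + H(Y)). The sketch below computes only these definitions. It does not reproduce the paper's hybrid framework, which the truncated abstract does not describe, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG(X;Y) = H(X) + H(Y) - H(X,Y), i.e. the mutual information."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * IG(X;Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: no information to share
    return 2.0 * information_gain(x, y) / (hx + hy)


feature = [0, 0, 1, 1]
same_info = ["a", "a", "b", "b"]   # determined entirely by `feature`
unrelated = [0, 1, 0, 1]           # independent of `feature`

su_same = symmetric_uncertainty(feature, same_info)
su_unrelated = symmetric_uncertainty(feature, unrelated)
print(su_same, su_unrelated)
```

A value near 1 means one variable determines the other, and a value near 0 means the two are independent. Filter methods commonly use this as a relevance score between a feature and the class, or as a redundancy score between two features.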
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets which satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gen Expression
Nowadays, the growing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is a dimensionality reduction method, carried out through filtering and wrapping. Wrapper methods are more accurate than filter methods, whereas filter methods are faster and have a lower computational burden. With ...